Machine Learning for Medical Signal and Image Processing

Biomedical Engineering

Pablo Eduardo Caicedo Rodríguez, Ph.D.

2026-01-19

What is Convolution?

  • Convolution: A mathematical operation used to extract features from input data.
  • Filter/Kernel:
    • A small matrix (e.g., 3x3) that slides over the input.
    • Detects local patterns such as edges, textures, and colors.
  • Stride: Number of pixels by which the filter moves at each step.
  • Padding: Adds extra pixels (typically zeros) around the border of the input, which can preserve spatial dimensions.
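
The interaction of kernel, stride, and padding can be made concrete in a few lines of NumPy. A minimal sketch (the function name conv2d and the loop-based implementation are illustrative; real frameworks use optimized versions):

    import numpy as np

    def conv2d(image, kernel, stride=1, padding=0):
        """2D cross-correlation, the 'convolution' used in CNNs."""
        if padding > 0:
            image = np.pad(image, padding)   # zero-pad the border
        kh, kw = kernel.shape
        out_h = (image.shape[0] - kh) // stride + 1
        out_w = (image.shape[1] - kw) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                region = image[i * stride : i * stride + kh,
                               j * stride : j * stride + kw]
                out[i, j] = np.sum(region * kernel)  # element-wise product, then sum
        return out

With stride=1 and padding=1, a 3x3 kernel produces an output with the same spatial dimensions as the input.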

Convolution in Action

  • Input: A matrix of pixel values (e.g., an image).
  • Output (Feature Map): A matrix where each value represents the result of applying the filter over a region of the input.
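
As a concrete illustration with toy values, sliding a 3x3 vertical-edge kernel over a 5x5 image yields a 3x3 feature map whose nonzero entries mark the dark-to-bright transition:

    import numpy as np
    from scipy.signal import correlate2d

    image = np.array([[0, 0, 0, 9, 9]] * 5)   # dark left half, bright right half
    kernel = np.array([[-1, 0, 1],
                       [-2, 0, 2],
                       [-1, 0, 1]])           # Sobel-style vertical-edge detector

    feature_map = correlate2d(image, kernel, mode="valid")
    print(feature_map)   # 3x3 feature map; large values sit on the vertical edge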

What are CNNs?

  • Definition: CNNs are deep learning models primarily used for visual recognition tasks.
  • Key Concept: CNNs learn and detect hierarchical patterns in image data (e.g., edges, shapes, textures).
  • Importance: Automatically extract features, reducing the need for manual feature engineering.

Why CNNs?

  • Fully Connected Networks struggle with large images due to high dimensionality.
  • CNNs reduce the number of parameters by using local connectivity (convolutions) and weight sharing.
  • Efficient learning: CNNs exploit the spatial hierarchies present in images.
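
A back-of-the-envelope comparison makes the parameter savings concrete (sizes chosen for illustration):

    image_pixels = 224 * 224 * 3                # flattened RGB image: 150,528 inputs
    dense_params = image_pixels * 1000 + 1000   # dense layer to 1000 units: ~150.5 million
    conv_params = (3 * 3 * 3) * 64 + 64         # 64 filters of size 3x3x3: only 1,792
    print(dense_params, conv_params)            # 150529000 1792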

CNN Architecture Overview

  • Input Layer: Raw image data (e.g., 28x28 pixels for MNIST).
  • Convolutional Layer: Detects features from input images using filters.
  • Activation Function: Typically ReLU to introduce non-linearity.
  • Pooling Layer: Reduces the spatial dimensions (downsampling).
  • Fully Connected Layer: Performs classification based on extracted features.
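
Assembled end to end, the pipeline above might look like this minimal PyTorch sketch for 28x28 grayscale inputs such as MNIST (layer sizes are illustrative):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution: 16 learned filters
        nn.ReLU(),                                   # non-linearity
        nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),                             # 14x14 -> 7x7
        nn.Flatten(),                                # 32 * 7 * 7 = 1568 features
        nn.Linear(32 * 7 * 7, 10),                   # fully connected classifier, 10 classes
    )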

Activation Function (ReLU)

  • Purpose: Introduce non-linearity into the network, allowing CNNs to learn complex patterns.
  • ReLU Formula: f(x) = max(0, x)
  • Why ReLU?:
    • Faster convergence compared to sigmoid or tanh.
    • Mitigates the vanishing gradient problem, since the gradient is 1 for all positive inputs.
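
In code, ReLU is just an element-wise maximum with zero:

    import numpy as np

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(np.maximum(0, x))   # [0.  0.  0.  1.5 3. ]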

Pooling Layers

  • Purpose: Reduce the spatial dimensions of feature maps, decrease computational load, and control overfitting.
  • Types of Pooling:
    • Max Pooling: Selects the maximum value within a specified window.
    • Average Pooling: Calculates the average value within a specified window.
  • Benefits:
    • Retains the most important features (Max Pooling).
    • Smooths the feature maps (Average Pooling).
  • Common Parameters:
    • Kernel Size: Size of the window (e.g., 2x2).
    • Stride: Step size for moving the window.
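
Both pooling types are easy to see on a toy 4x4 feature map with a 2x2 window and stride 2 (values are illustrative):

    import torch
    import torch.nn.functional as F

    x = torch.tensor([[[[1., 3., 2., 4.],
                        [5., 6., 1., 2.],
                        [7., 2., 9., 0.],
                        [3., 4., 1., 8.]]]])         # shape (batch, channels, H, W)

    print(F.max_pool2d(x, kernel_size=2, stride=2))  # [[6., 4.], [7., 9.]]
    print(F.avg_pool2d(x, kernel_size=2, stride=2))  # [[3.75, 2.25], [4.0, 4.5]]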

Fully Connected Layers

  • Flattening: Converts the 2D feature maps into a 1D vector for input into fully connected layers.
  • Fully Connected (Dense) Layers: Every neuron in the previous layer is connected to every neuron in the next layer.
  • Role: Performs classification based on features learned from convolution and pooling layers.
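
For example, 32 feature maps of size 7x7 flatten into a 1,568-element vector that feeds a dense classifier (sizes are illustrative):

    import torch
    import torch.nn as nn

    features = torch.randn(1, 32, 7, 7)        # one sample: 32 feature maps, 7x7 each
    flat = nn.Flatten()(features)              # shape (1, 1568)
    logits = nn.Linear(32 * 7 * 7, 10)(flat)   # one score per class, shape (1, 10)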

Training CNNs

  • Loss Function: Cross-entropy loss is commonly used for classification tasks.
  • Optimization: Backpropagation combined with optimizers like stochastic gradient descent (SGD) or Adam.
  • Training Concepts:
    • Epochs: Number of complete passes over the dataset.
    • Mini-batches: Small subsets of the dataset used in each iteration.
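
A skeletal PyTorch training loop tying these pieces together (the tiny model and the single dummy mini-batch are stand-ins for illustration):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in classifier
    batches = [(torch.randn(32, 1, 28, 28),
                torch.randint(0, 10, (32,)))]                    # one dummy mini-batch

    loss_fn = nn.CrossEntropyLoss()                              # classification loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(10):                   # epochs: full passes over the dataset
        for images, labels in batches:        # mini-batches (a DataLoader in practice)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()                   # backpropagation computes gradients
            optimizer.step()                  # Adam updates the weights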

Challenges

  • Computational Resources: CNNs require powerful hardware (e.g., GPUs) for training large models.
  • Large Datasets: CNNs often need vast amounts of labeled data to perform well.
  • Overfitting: Common problem in CNNs when trained on small datasets. Solutions include:
    • Data augmentation (rotating, flipping, or zooming images).
    • Dropout layers to randomly drop neurons during training.
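
Both remedies are one-liners in common frameworks; a sketch using torchvision transforms and a dropout layer (the specific transforms and probabilities are illustrative):

    import torch.nn as nn
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomRotation(10),          # rotate by up to +/-10 degrees
        transforms.RandomHorizontalFlip(),      # flip left-right with p = 0.5
        transforms.RandomResizedCrop(28, scale=(0.9, 1.0)),  # mild random zoom
        transforms.ToTensor(),
    ])

    dropout = nn.Dropout(p=0.5)   # randomly zeroes half the activations during training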

Future of CNNs

  • Advanced Architectures:
    • Residual Networks (ResNet):
      • Deeper networks can be trained by using skip connections that let gradients bypass layers, mitigating the vanishing gradient problem.
    • Inception Networks:
      • Utilize multiple filters of different sizes in parallel to capture features at different scales.
    • EfficientNet:
      • Balances network depth, width, and resolution, creating more efficient models with fewer parameters while maintaining accuracy.
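
The skip connection at the heart of ResNet amounts to a single addition; a minimal residual block sketch in PyTorch:

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.conv2(self.relu(self.conv1(x)))
            return self.relu(out + x)   # skip connection: add the input back in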